Abstract

An integer programming approach to item pool design is presented that can be used to 
calculate an optimal blueprint for an item pool to support an existing testing program. The 
results are optimal in the sense that they minimize the efforts involved in actually
producing the items as revealed by current item writing patterns. Also, an adaptation of 
the models for use as a set of monitoring tools in item pool management is presented. The 
approach is demonstrated empirically for an item pool designed for the Law School 
Admission Test (LSAT). 
An Integer Programming Approach to Item Pool Design

Recently, a variety of methods for automated assembly of test forms from an item 
pool have become available. Each of these methods can be classified as belonging to one 
of the following classes (van der Linden, 1998): (1) heuristics that select items 
sequentially to match a target for the test information function or to fit a weighted 
combination of the test specifications (e.g., Ackerman, 1992; Luecht, 1998; Sanders & 
Verschoor, 1998; Swanson & Stocking, 1993); (2) methods that model the test assembly 
problem as a 0-1 linear programming (LP) problem and then use a search algorithm to 
find a solution (e.g., Adema, Boekkooi-Timminga & van der Linden, 1991;
Boekkooi-Timminga, 1987, 1990; Theunissen, 1985; Timminga & Adema, 1996; van der Linden,
1994, 1996; van der Linden & Boekkooi-Timminga, 1989); (3) methods based on
network-flow programming with Lagrange relaxation and/or embedding of the network model in a
heuristic (Armstrong & Jones, 1992; Armstrong, Jones & Wang, 1994, 1995; Armstrong,
Jones & Wu, 1992); and (4) methods based on optimal design theory from statistics (e.g.,
Berger, 1994). Detailed descriptions and examples of these methods are given in a recent 
special issue of Applied Psychological Measurement on optimal test assembly (van der 
Linden, 1998). 

These test assembly methods result in tests that are optimal or close to optimality. 
However, even when optimal, the results need not be satisfactory because an important 
constraint on the quality of the tests is imposed by the composition of the item pool. For 
example, an item pool can have enough items with the content attributes required by the 
test specifications but their statistical attributes may be off target in the sense that the 
items are too difficult or too easy. This case can easily occur if an item pool is frequently 
used and certain categories of items in the pool are depleted quickly. The result is then an 
optimal test with an information function too low on a relevant interval on the ability 
scale. Though the problem of item pool depletion is less likely to happen for larger item 
pools, it should not be inferred that larger item pools are necessarily optimal. On the 
contrary, a well-known phenomenon in item pool management is that a considerable
proportion of the items in the pool may never be used. The presence of such "wallflowers"
can be the result of attribute values not needed by the test specifications or 
overrepresented in the pool. Since the costs of screening and pretesting items are generally 
high, items in either category typically involve a considerable loss of resources. 

This paper presents an integer-programming method for item pool design. The
method results in a blueprint for an item pool, that is, a document specifying what 
attributes the items in a new item pool or an update of an existing pool should have. As 
will become clear below, the blueprint is designed to allow for the assembly of a 
prespecified number of test forms from the pool each with its own set of specifications. At 
the same time, it is optimal in the sense that the efforts or "costs" involved in realizing the
item pool are minimized. A favorable consequence of this objective is that the number of
unused items is also minimized. (In practice, it may be prudent to have a few spare items
though; see the discussion later in this paper.)

The actual task of writing test items to a blueprint is difficult. The main reason
lies not so much in the content attributes of the items as in their
statistical attributes, such as p-values, item-test correlations, and IRT parameters. It is a
common experience that the values of statistical attributes of individual items are only 
loosely predictable. At the same time, however, at the level of a pool of items, statistical 
attributes often show persistent patterns of correlation with content attributes. In this paper, 
these patterns are used to derive an empirical measure for item writing efforts that is 
minimized in the design model. 

The point of view taken in this paper is that item pools are not static entities. Tests 
are assembled from the pool and subsequently released, or items may be removed from 
the pool because they become obsolete. In most testing programs, new items are therefore 
written and pretested on a continuous basis. Though presented as a method for designing a 
single pool, it is believed that in practice the models in this method will serve as tools for 
monitoring the item writing process on a more continuous basis. The slight adaptation
needed to use the models for item pool management is presented later in this paper. As 
will become clear below, if the models are used in this mode, possible differences between 
the statistical attributes of items in the blueprint and their actual values in the pretest are 
automatically compensated in the next application of the design models. 

The problem of item pool design has been addressed earlier in Boekkooi-Timminga 
(1991) and Stocking and Swanson (1998). The former paper also uses integer 
programming to calculate the numbers of items needed for future test forms but follows a 
sequential approach maximizing the information function of each subsequent test under the 
Rasch or one-parameter logistic model. The results are then used to improve on the 
composition of an existing item pool. The model proposed in this paper directly calculates 
a blueprint for the entire item pool (though it is sometimes efficient to do the calculations 
sequentially). In addition, its objective is minimizing the costs of actually producing the 
pool by the current item writers rather than maximizing the test information functions. At 
the same time, the model guarantees that the targets for the test information functions are 
met. Finally, this model is not restricted to items calibrated under the Rasch model. The 
paper by Stocking and Swanson does not deal with the problem of designing an item pool 
as such. Rather, it presents a method for assigning items from a master pool to a set of 
smaller pools accessed randomly in an adaptive testing program to minimize item 
exposure among examinees. 

The remainder of the paper is organized as follows: First, the problem of item pool 
design is analyzed, making an important distinction between test specifications based on 
categorical and quantitative item attributes. Then the design method is presented, and it is 
shown how its models minimize the existing costs involved in item writing. In addition, it 
is explained how the method can be used if the item pool has to support a testing program 
with sets of items related to common stimuli. The next section of the paper explains how 
the proposed models can be adapted for use as monitoring tools in item pool management. 
Finally, an empirical application of the models to the problem of designing an item pool
for the Law School Admission Test (LSAT) is presented. 

Analysis of Design Problem 

An important distinction between test specifications or constraints in mathematical 
programming models for test assembly is the one between constraints on categorical item 
attributes, on quantitative attributes, and constraints needed to represent inter-item 
dependencies (van der Linden, 1998). This distinction also plays a critical role in the item 
pool design model presented in this paper. 

Categorical Constraints 

The defining characteristic of a categorical item attribute, such as item content, 
cognitive level, format, author, or answer key, is that it partitions the item pool into a 
series of subsets. A test specification with respect to a constraint on a categorical attribute 
generally constrains the distribution of the items in the test over the subsets. If the items 
are coded by multiple attributes, their Cartesian product introduces a partition of the pool. 
Constraints on categorical attributes then address not only marginal distributions of items 
on attributes but also their joint and conditional distributions. 

A natural way to represent categorical attributes is by a table. An example for the
case of two categorical attributes with a few constraints on their distributions is given in
Table 1.

[Insert Table 1 about here]

One attribute is item content, C (with levels C1, C2, and C3); the other is item
format, F (with levels F1 and F2). In the first panel, the full distribution of the items in
the pool is represented by the numbers $n_{ij}$, $n_{i.}$, $n_{.j}$, and $n_{..}$, which are the numbers of items in
cell (i,j), row i, column j, and the total table, respectively. Likewise, in the second panel
the numbers of items in the test are denoted as $r_{ij}$, $r_{i.}$, $r_{.j}$, and $r_{..}$. The following set of
constraints is imposed on the test:

$r_{12} = 6$; ...
Note that this set not only fixes certain numbers directly but also restricts the values
possible for the other numbers in the table. For example, the first and last constraint
together imply the constraint $r_{11} + r_{13} = 2$. This fact sometimes allows us to represent the
same set of test specifications by different sets of constraints. Some of these sets may be 
smaller and therefore more efficient than others. The method in this paper, however, is 
neutral with respect to such differences. 

In a test assembly problem, values for the numbers $r_{ij}$ are sought such that the
constraints on all distributions are met and the combination of the values optimizes an
objective function. In so doing, the numbers $n_{ij}$ of items in the pool are fixed and serve as
upper bounds to the numbers $r_{ij}$. The basic approach to the item pool design problem in
this paper is to reverse the role of these two quantities. The numbers $n_{ij}$ are now taken as
the decision variables and a function of them is optimized subject to all constraint sets
implied by the specifications of the tests the pool has to support.

Quantitative Constraints 

Examples of quantitative item attributes are word counts, exposure rates, values for 
item response theory (IRT) information functions, expected response times, and classical 
parameters such as p-values and item-test correlations. Unlike categorical constraints,
quantitative constraints do not impose bounds directly on numbers of items but on a 
function of their joint attribute values, mostly a sum or an average. 

In IRT-based test assembly, important constraints are those on the information 
function of the test. These constraints typically require the sum of values of the item 
information functions to meet certain bounds. It is important to note that though each 
combination of item information functions defines a unique test information function, the 
reverse does not hold. Unlike constraints on categorical attributes, constraints on quantitative attributes
have no one-to-one correspondence with item distributions but correspond to sets of distributions that are
feasible with respect to the constraints. However, this property should not be viewed as a
disadvantage. It can be exploited to choose a distribution to represent a quantitative
constraint that is optimal with respect to an objective function. This paper follows this
approach and translates all constraints on quantitative attributes into an optimal distribution of
the items over tables defined by a selection of their values.

The option of choosing an objective function for the selection of an optimal item 
distribution is used to solve a problem alluded to earlier, namely the one of the difficulty 
involved in writing items with prespecified values for their statistical attributes. More 
concretely, it is proposed to use the distribution of the statistical attributes for a recent 
item pool as an indicator of the efforts or costs involved in writing items with certain 
combinations of attribute values by the item writers. In so doing, the assumption is that 
items with combinations of attribute values with higher frequencies are easier or less
"costly" to produce. More realistic information on item writing costs is obtained if the
joint distribution of all quantitative and categorical attributes is used. If persistent 
differences between item writers exist, further improvement is possible by choosing item 
writers as one of the (categorical) attributes defining the joint distribution. 

The idea will now be formalized for constraints on test information functions but 
can easily be generalized to constraints on other quantitative attributes (see the other 
examples later in this paper). In the empirical example below, the 3-parameter logistic 
model was used to calibrate the existing item bank: 

$$p_i(\theta) = \mathrm{Prob}\{U_i = 1 \mid \theta\} = c_i + (1 - c_i)\{1 + \exp[-a_i(\theta - b_i)]\}^{-1}, \tag{1}$$

where $\theta$ is the unknown ability of the examinee and $a_i \in [0,\infty)$, $b_i \in (-\infty,\infty)$, and $c_i \in [0,1]$
are the discrimination, difficulty, and guessing parameters for item i, respectively (Lord,
1980, chap. 2). First, the parameters $a_i$, $b_i$, and $c_i$ are discretized, that is, their scales of
possible values are replaced by a grid of discrete values, $(a_d, b_d, c_d)$, with $d = 1, \ldots, D$ grid
points. The number of points on the grid as well as their spacing is free. Let Q be the
table defined by the product of these grids for all quantitative attributes, with arbitrary cell
q. The symbol C is used to represent the full table defined by the categorical attributes,
with an arbitrary cell denoted by $c \in C$. A cell in the joint table defined by C and Q is
denoted as $(c,q) \in C \times Q$.
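
As a minimal sketch of Equation 1 and of the grid discretization, the following Python fragment evaluates the 3PL response probability and the corresponding item information at the midpoints of a grid. The information formula is the standard one for the 3PL model in the logistic metric of Equation 1. The function names, the parameter ranges, the numbers of intervals, and the fixed guessing value are assumptions made only for illustration; they are not taken from the study.

```python
import math

def p_3pl(theta, a, b, c):
    """Response probability under the 3-parameter logistic model (Equation 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Item information for the 3PL model in the logistic metric of Equation 1."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def midpoints(lower, upper, n_intervals):
    """Midpoints of n_intervals equal-width intervals on [lower, upper]."""
    width = (upper - lower) / n_intervals
    return [lower + (k + 0.5) * width for k in range(n_intervals)]

# Hypothetical grid: ranges and numbers of intervals are illustrative only.
a_grid = midpoints(0.4, 2.0, 4)
b_grid = midpoints(-3.0, 3.0, 6)
c_fixed = 0.2  # assumed common guessing value

# One information value per cell q = (a, b) of table Q at one ability level.
theta = 0.0
cell_info = {(a, b): info_3pl(theta, a, b, c_fixed)
             for a in a_grid for b in b_grid}
```

Each entry of `cell_info` plays the role of the information contributed by one item from the corresponding cell of Q at the chosen ability level.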

Let $x_{cq}$ denote the frequency of the items in cell (c,q) for a representative pool.
These frequencies contain information on the efforts involved in writing items for the
various cells in the table. Cells with relatively large frequencies represent combinations of 
categorical and quantitative attribute values that tend to go together often; apparently, such 
items are easy to produce. On the other hand, empty cells seem to point at combinations 
of attribute values that are difficult to produce. A monotonically decreasing function of
$x_{cq}$, denoted as $\varphi(x_{cq})$, will be used as an empirical measure of the efforts involved in
writing items with the various possible combinations of attribute values. The generic term 
"cost function" will be used for this function. In the model below, the costs for writing the 
new item pool will be minimized using this function. 

A simple cost function is $\varphi(x_{cq}) = x_{cq}^{-1}$, which requires $x_{cq} > 0$. Other choices are
possible. However, as will be obvious from the definition of the objective function in
Equation 2 below, the choice of unit does not matter. Also, before calculating $\varphi(x_{cq})$, it
is recommended to collapse the $C \times Q$ table over attributes that show no substantial
dependencies on any of the other attributes. This operation will result in larger frequencies 
and hence a more stable estimate of the distribution over CxQ. If different patterns of 
correlation exist for different item writers, the cost function can be made more realistic by 
adding item writers as a categorical attribute to the table. If this option is used, a 
constraint has to be added to the design models below on the number of items or stimuli 
to be written by each item writer. The blueprint of the new item pool then automatically 
shows which types of items have to be written by which author. 
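
The reciprocal cost function can be sketched in a few lines of Python. The example assumes a hypothetical previous pool in which each item is coded by one categorical and one quantitative attribute; the labels, the helper name `cost_function`, and the value used for empty cells are illustrative assumptions. Substituting a large value for empty cells follows the practice described in the empirical example later in this paper.

```python
from collections import Counter

def cost_function(frequencies, cells, empty_cost=1000.0):
    """Return phi[cell] = 1 / x[cell]; empty cells get a large, arbitrary cost."""
    phi = {}
    for cell in cells:
        x = frequencies.get(cell, 0)
        phi[cell] = 1.0 / x if x > 0 else empty_cost
    return phi

# Hypothetical previous pool: (content category, difficulty interval) per item.
previous_pool = [
    ("C1", "easy"), ("C1", "easy"), ("C1", "hard"),
    ("C2", "easy"), ("C2", "hard"), ("C2", "hard"), ("C2", "hard"),
]
x = Counter(previous_pool)  # observed cell frequencies

# All cells of the joint table, including cells that are empty in the pool.
cells = [(c, q) for c in ("C1", "C2", "C3") for q in ("easy", "hard")]
phi = cost_function(x, cells)
```

Cells that occur often in the previous pool receive a low cost, and cells that never occur receive the large substitute value, so a cost-minimizing blueprint avoids them unless the test specifications force their use.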
It should be noted that the use of a cost function defined on item writing
practices for a recent item pool is not conservative in the sense that old practices are 
automatically continued. The new item pool can be planned freely to support any new sets 
of test specifications, and the integer programming model guarantees that test forms can 
be assembled to these specifications. The point is, however, that a potentially large set of 
item pools can be expected to be feasible for the integer programming model. The cost 
function is used only to select a solution from this set that minimizes the costs of item 
writing. 

Constraints on Interdependent Items 

The constraints in this category deal with possible relations of exclusion and 
inclusion between the items in the pool. Two items exclude each other, for example, if one 
item contains a clue to the key on the other item ("enemies"). In 0-1 LP-based test 
assembly, it is possible to constrain the test to have no more than one item from each 
known set of enemies. However, the problem of enemies in the item pool is basically a 
problem of how to distribute them over different test forms if they happen to occur. Also, 
the number of enemies in an item pool is generally low. The position taken here is 
therefore that the presence of enemies is a problem of test assembly — not of item pool 
design. It will therefore be ignored in the remainder of this paper. 

An important type of inclusion relation exists between items that are organized 
around common stimuli, for example, a reading passage in a reading comprehension test 
or a description of an experiment in a biology test. We will use "item sets" as a generic 
term for this part of a test. Typically, the items in these sets are selected from larger sets 
available in the pool. Selection of item sets often involves constraints on categorical (e.g., 
content) and quantitative (e.g., word counts) attributes for the stimuli. Several versions of 
0-1 LP models for test assembly are available to deal with pools with item sets (van der 
Linden, in preparation). 

The problem of designing a pool with item sets is solved by the following
three-stage procedure:

1. First, a blueprint for a pool of items is designed using the integer 
programming model in Equations 2-6 below ignoring the item set structure. 
The model constrains the distributions of the items over their categorical 
and quantitative attributes. The objective function minimizes a cost function 
for writing the items. 

2. Next, a blueprint for a pool of stimuli for the item sets is designed using the 
same methodology as for the pool of items. The model now constrains the 
distribution of the stimuli over their categorical and quantitative attributes 
and the objective function minimizes a cost function for writing the stimuli. 

3. Finally, items are assigned to the stimuli to form item sets. The assignment 
is done using a separate integer programming model formulated for this 
task. The constraints in the model control the assignment both for the 
numbers of items available in the various cells of the CxQ table and the 
numbers required in the item sets. The objective function is of the same 
type as above; its specific form will be explained later in this paper. 

The fact that these steps are taken separately should not come as a surprise. They only 
serve to design an item pool with a set structure. Of course, if the design is realized, the 
actual stimuli and items in the sets are written simultaneously and in a coordinated 
fashion. 

Models for Item Pool Design 

In this section the various models announced above will be explained. First, the 
model for designing the pool of items is presented. Then the case of item sets is addressed 
and the models for designing a pool of stimuli and assigning items to stimuli are
explained.

Pool of Items 

The following notation is needed to present the model. Index $f = 1, \ldots, F$ will be used
to represent the individual test forms the item pool should support. As before, the symbols C
and Q will be used to represent the tables defined by the categorical and quantitative
attributes, respectively. As an example of a constraint on a quantitative attribute, the
information function of test form f will be required to approach a set of target values
$T_f(\theta_k)$, $k = 1, \ldots, K$, from above. Using this attribute for the model in Equation 1 implies a
three-dimensional table Q, with one dimension for each item parameter in (1). The
information on $\theta$ in a response to an item in cell $q \in Q$ will be denoted as $I_q(\theta)$; this
quantity is calculated, for example, by substituting the midpoints of the intervals of the
item-parameter values defining cell q of the table. The decision variables in the model are
integer variables $n_{fcq}$. These variables represent the number of items in cell (c,q) needed
in the pool to support form f. The complete pool is thus defined by the numbers $\sum_{f=1}^{F} n_{fcq}$.

The cost function is still denoted as $\varphi_{cq}$.
The model is as follows:



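$$\text{minimize} \quad \sum_{f=1}^{F} \sum_{c} \sum_{q} \varphi_{cq}\, n_{fcq} \qquad \text{(minimizing costs)} \tag{2}$$

subject to

$$\sum_{c} \sum_{q} I_q(\theta_k)\, n_{fcq} \geq T_f(\theta_k), \quad f = 1, \ldots, F, \; k = 1, \ldots, K, \qquad \text{(test information)} \tag{3}$$

$$\sum_{c \in V_{fg}} \sum_{q} n_{fcq} \geq n_{fg}, \quad f = 1, \ldots, F, \; g = 1, \ldots, G, \qquad \text{(categorical constraints)} \tag{4}$$

$$\sum_{c} \sum_{q} n_{fcq} \geq n_f, \quad f = 1, \ldots, F, \qquad \text{(length of forms)} \tag{5}$$

$$n_{fcq} = 0, 1, \ldots, \quad f = 1, \ldots, F, \; c \in C, \; q \in Q. \qquad \text{(integer variables)} \tag{6}$$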


The objective function in Equation 2 minimizes the sum of the item writing costs across
all items in the F forms. For each form the constraints in Equation 3 require the
information function to be no smaller than the target values at $\theta_k$, $k = 1, \ldots, K$. The objective
function in Equation 2 guarantees that these bounds are approached from above. The
categorical constraints imposed on the forms are formulated in Equation 4. The sets $V_{fg}$,
$f = 1, \ldots, F$ and $g = 1, \ldots, G$, are the sets of cells in C on which the constraints have to be
imposed. For example, in the test form in Table 1 the first constraint is imposed on the set
of cells consisting only of cell (1,1), the second on the set of cells in Row 1, and the third
on the set of cells in Column 1. Only lower bounds $n_{fg}$ are set; the objective function in
Equation 2 guarantees that the constraints are satisfied as equalities at optimality. The
same holds for the constraints on the lengths of the forms in Equation 5.

Other quantitative constraints can be added to the model following the same logic 
as in Equation 3. Unlike LP models for simultaneous assembly with decision variables for 
the individual items, in the current model it is not necessary to prevent the same item 
from being assigned to more than one form. Thus the large number of extra constraints 
required to preclude item overlap (Boekkooi-Timminga, 1987; van der Linden & Adema, 
1998) need not bother us here. 

For smaller CxQ tables, the optimal values of the variables in the model can be 
calculated using one of the available implementations of the branch-and-bound algorithm 
for integer programming. For larger tables, solutions can be obtained by relaxing the
integer constraints and using the simplex algorithm. The simplex algorithm is capable of handling
problems with thousands of variables in a small amount of time (Wagner, 1978).
Fractional values in the solution can then be rounded upward; as already noted, it is
always prudent to have a few spare items in the pool.
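
For a very small table, the structure of the model in Equations 2-6 can be illustrated by plain enumeration. The sketch below is neither the branch-and-bound nor the simplex approach referred to above; it simply tries all integer blueprints for a single form, leaves out the categorical constraints in Equation 4, and uses made-up cell labels, costs, information values, and targets. The function name `solve_design` and the cap `max_per_cell` are likewise assumptions for illustration.

```python
from itertools import product

def solve_design(cells, phi, info, targets, n_min, max_per_cell):
    """Enumerate integer blueprints for one form; return the cheapest feasible one.

    cells: list of cell labels; phi[cell]: cost per item (objective, Equation 2);
    info[cell][k]: information of one item from the cell at theta_k (Equation 3);
    targets[k]: target value at theta_k; n_min: minimum form length (Equation 5).
    """
    best, best_cost = None, float("inf")
    for counts in product(range(max_per_cell + 1), repeat=len(cells)):
        if sum(counts) < n_min:
            continue  # form too short
        meets_targets = all(
            sum(info[c][k] * n for c, n in zip(cells, counts)) >= t
            for k, t in enumerate(targets)
        )
        if not meets_targets:
            continue  # information target missed at some theta_k
        cost = sum(phi[c] * n for c, n in zip(cells, counts))
        if cost < best_cost:
            best, best_cost = dict(zip(cells, counts)), cost
    return best, best_cost

# Hypothetical three-cell table with information at two ability points.
cells = ["q1", "q2", "q3"]
phi = {"q1": 1.0, "q2": 1.5, "q3": 4.0}
info = {"q1": [0.25, 0.25], "q2": [0.5, 0.5], "q3": [1.0, 0.5]}
blueprint, total_cost = solve_design(cells, phi, info, targets=[2.0, 1.5],
                                     n_min=4, max_per_cell=8)
```

In this toy instance the cheapest blueprint draws all four items from the cell with the lowest cost per unit of information, and both the information target at the first ability point and the form length are met exactly.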

Pool of Stimuli 

Tables C' and Q' are now defined for the sets of categorical and quantitative 
attributes used to describe the stimuli in the test forms the item pool has to support. Since 
psychometric attributes for stimuli are rare, table Q' is expected to be much smaller than 
Q. Item sets do often have aggregated statistical attributes, such as sums of p-values or
average b-values. However, these aggregates belong to the set of items associated with a
stimulus, not to the stimulus itself. Constraints on such aggregated attributes are dealt with
in the item assignment model below. 

The model is analogous to that in Equations 2-6. The cost function $\varphi_{c'q'}$ is now
defined for the distribution of stimuli in the previous item pool. Likewise, the bounds in
Equations 3-4 are derived from the specifications for the item sets in the various test
forms. As an example of a quantitative attribute for stimuli, word counts are used. Let
$w_{q'}$ be the number of words for a stimulus in cell $q'$ and $w_f$ the target for the number of
words for a stimulus in form f. The constraints needed are:

$$\sum_{c'} \sum_{q'} w_{q'}\, n_{fc'q'} \geq w_f, \quad f = 1, \ldots, F. \qquad \text{(word counts)} \tag{7}$$

Again, because of minimization in Equation 2, the bounds in the constraints in Equation 7
are approximated from above and serve as targets for the number of words per stimulus.

The output from the model is an optimal array of frequencies $n_{fc'q'}$ for form f.
The blueprint for the complete pool of stimuli is determined by the numbers $\sum_{f} n_{fc'q'}$.

Assigning Items to Stimuli 

To introduce the item assignment model, index $s_f = 1, \ldots, S_f$ is defined to denote the
item sets in form $f = 1, \ldots, F$. Each of these sets is associated with one stimulus, that is, one
of the cells $(c',q')$. The total numbers of stimuli associated with the cells satisfy the
optimal numbers $n_{fc'q'}$ from the model in the previous section. The association is
arbitrary and assumed to be made prior to the item assignment. For each set the attribute
values of its stimulus are thus assumed to be known; however, for notational convenience,
the dependence of $s_f$ on $(c',q')$ will remain implicit in this paper.

In addition, integer decision variables $z_{s_f cq}$ are defined to denote the number of
items from cell (c,q) in the item table assigned to set $s_f$. Finally, a cost function $\varphi_{cqc'q'}$
is defined on the Cartesian product of the tables $C \times Q$ and $C' \times Q'$. This function reflects
the costs of writing an item with attributes (c,q) for a stimulus with attributes $(c',q')$.
The item assignment model is as follows:


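$$\text{minimize} \quad \sum_{f} \sum_{s_f} \sum_{c} \sum_{q} \varphi_{cqc'q'}\, z_{s_f cq} \qquad \text{(minimizing costs)} \tag{8}$$

subject to

$$\sum_{c} \sum_{q} z_{s_f cq} \geq n_{s_f}, \quad s_f = 1, \ldots, S_f, \; f = 1, \ldots, F, \qquad \text{(\# of items needed)} \tag{9}$$

$$\sum_{f} \sum_{s_f} z_{s_f cq} \leq n_{cq}, \quad c \in C, \; q \in Q, \qquad \text{(\# of items available)} \tag{10}$$

$$z_{s_f cq} = 0, 1, \ldots, \quad s_f = 1, \ldots, S_f, \; f = 1, \ldots, F, \; c \in C, \; q \in Q. \qquad \text{(integer variables)} \tag{11}$$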


The constraints in Equation 9 assign $n_{s_f}$ items to item set $s_f$, whereas the constraints in
Equation 10 ensure that no more items are assigned from cell (c,q) than available in the
blueprint for the item pool.
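
The item assignment model has the structure of a small transportation problem, which can again be illustrated by enumeration. The sketch below assumes two hypothetical item sets and two cells of the item table; the costs are indexed directly by set and cell because the stimulus attributes of each set are taken as known. All names and numbers are assumptions for illustration, and enumeration stands in for a proper integer programming solver.

```python
from itertools import product

def assign_items(needed, available, cost):
    """Cheapest assignment of items from cells to item sets by enumeration.

    needed[s]: number of items required in set s (Equation 9);
    available[c]: number of items in cell c of the blueprint (Equation 10);
    cost[s, c]: cost of writing an item from cell c for the stimulus of set s.
    """
    sets, cells = list(needed), list(available)
    pairs = [(s, c) for s in sets for c in cells]
    upper = max(needed.values())  # no set ever needs more than this from one cell
    best, best_cost = None, float("inf")
    for values in product(range(upper + 1), repeat=len(pairs)):
        z = dict(zip(pairs, values))
        if any(sum(z[s, c] for c in cells) < needed[s] for s in sets):
            continue  # some set is short of items
        if any(sum(z[s, c] for s in sets) > available[c] for c in cells):
            continue  # more items assigned than the blueprint holds
        total = sum(cost[p] * z[p] for p in pairs)
        if total < best_cost:
            best, best_cost = z, total
    return best, best_cost

needed = {"s1": 2, "s2": 2}
available = {"q1": 3, "q2": 1}
cost = {("s1", "q1"): 1, ("s1", "q2"): 3, ("s2", "q1"): 2, ("s2", "q2"): 1}
assignment, total_cost = assign_items(needed, available, cost)
```

Because the four available items are exactly the four items needed, only two assignments are feasible here, and the cheaper one gives the single item from cell q2 to the set for which it is least costly.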

If constraints on aggregated quantitative item attributes have to be imposed on
some of the item sets, the model has to be expanded. For example, if item set $s_f$ has to
have an average p-value between lower and upper bounds $p_{s_f}^{(l)}$ and $p_{s_f}^{(u)}$, respectively, the
following two constraints should be added to the model:

$$\sum_{c} \sum_{q} p_q\, z_{s_f cq} \geq n_{s_f}\, p_{s_f}^{(l)}, \quad s_f = 1, \ldots, S_f, \; f = 1, \ldots, F, \qquad \text{(lower bound on p)} \tag{12}$$

$$\sum_{c} \sum_{q} p_q\, z_{s_f cq} \leq n_{s_f}\, p_{s_f}^{(u)}, \quad s_f = 1, \ldots, S_f, \; f = 1, \ldots, F. \qquad \text{(upper bound on p)} \tag{13}$$
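
A quick way to see how Equations 12-13 work is to check them for a given assignment. The helper below is a hypothetical illustration: it assumes that the p-value is one of the quantitative attributes defining the cells, so that each cell carries a representative p-value, and it tests whether the sum of p-values in one item set lies between the set size times the lower bound and the set size times the upper bound.

```python
def average_p_within_bounds(z_set, p_cell, n_set, p_low, p_high):
    """Check the bounds in Equations 12-13 for one item set.

    z_set[q]: number of items assigned to the set from cell q;
    p_cell[q]: representative p-value of cell q; n_set: required set size.
    """
    weighted = sum(p_cell[q] * z for q, z in z_set.items())
    return n_set * p_low <= weighted <= n_set * p_high

# Hypothetical set of four items drawn from two cells.
z_set = {"q_easy": 2, "q_hard": 2}
p_cell = {"q_easy": 0.75, "q_hard": 0.5}

ok_wide = average_p_within_bounds(z_set, p_cell, n_set=4, p_low=0.5, p_high=0.7)
ok_narrow = average_p_within_bounds(z_set, p_cell, n_set=4, p_low=0.65, p_high=0.7)
```

Multiplying the bounds by the set size keeps both constraints linear in the decision variables, which is why the average is not computed explicitly.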
Models for Item Pool Management

As already indicated, item pools are not static entities. In most testing programs, 
tests are assembled from the pool and new items are pretested on a continuous basis.
Hence, two important tasks of item pool management are: (1) monitoring the
developments in the item pool; and (2) instructing item writers to write new items to 
complete the pool. 

The models in this paper can easily be adapted for use in item pool management. 
The only thing needed is to correct the decision variables in the models for the numbers of 
items and stimuli currently available in the pool. The principle is illustrated for the model 
in Equations 2-6. Let $\nu_{cq}$ be a constant representing the current number of items in cell
(c,q) in the pool and $\eta_{cq}$ a new decision variable denoting the number of items to be
written for this cell. The only adaptation necessary is substituting $\nu_{cq} + \eta_{cq}$ for the old
decision variables in the model.
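
The substitution can be sketched for a toy case with a single form and one ability point. The fragment below enumerates the numbers of new items per cell so that current stock plus new items meets an information target and a minimum length. The constant cost of the items already in the pool is dropped from the objective, which does not change the minimizer. Cell labels, stock, costs, and targets are assumptions for illustration, and enumeration again stands in for an integer programming solver.

```python
from itertools import product

def items_to_write(cells, phi, info, target, n_min, stock, max_new):
    """Cheapest numbers of new items such that stock + new items meets the bounds."""
    best, best_cost = None, float("inf")
    for eta in product(range(max_new + 1), repeat=len(cells)):
        # Substitute (current stock + items to be written) for the old variables.
        total = [stock[c] + e for c, e in zip(cells, eta)]
        if sum(total) < n_min:
            continue
        if sum(info[c] * n for c, n in zip(cells, total)) < target:
            continue
        cost = sum(phi[c] * e for c, e in zip(cells, eta))  # only new items cost effort
        if cost < best_cost:
            best, best_cost = dict(zip(cells, eta)), cost
    return best, best_cost

cells = ["q1", "q2", "q3"]
phi = {"q1": 1.0, "q2": 1.5, "q3": 4.0}
info = {"q1": 0.25, "q2": 0.5, "q3": 1.0}
stock = {"q1": 2, "q2": 1, "q3": 0}  # hypothetical items currently in the pool
new_items, writing_cost = items_to_write(cells, phi, info, target=2.0,
                                         n_min=4, stock=stock, max_new=8)
```

Rerunning such a calculation after each pretest, with the stock updated to the items that actually entered the pool, is one way to read the compensation mechanism described earlier in this paper.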

If the current items in the pool reveal new patterns of correlation between
categorical and quantitative attributes, the cost functions $\varphi_{cq}$ can be updated by defining
them on $\nu_{cq}$ rather than the frequencies $x_{cq}$ for the previous item pool, or perhaps on a
weighted combination of both. This practice is recommended, for example, if the item
writers form a categorical attribute in the definition of table $C \times Q$ and new item writers
have been hired.

Empirical Example 

The method in this paper was used to design a new item pool for the Law School 
Admission Test (LSAT). The purpose of this study was only to illustrate the procedure.
For security reasons, the exact item and stimulus attributes in this study are not revealed.

The pool has to support three different sections in the test, labeled here as A, B 
and C. The first two sections have items organized as sets with a common stimulus; the 
last section consists of discrete items. The three sections in the test are assembled to meet 
large sets of constraints on their composition dealing with such features as item 
(sub)types, item-set structures in the pool, types of stimuli, gender and minority orientation
of the stimuli, answer key distributions, word counts, and test information functions. As
the authors of the items could not be identified, author was not used as an attribute in the 
analyses of the pool. The numbers of attributes for each section are given in Table 2. 

[Insert Table 2 about here] 

The item pool was designed to support ten regular forms of the LSAT, ten forms 
with a target for the test information function shifted 0.6 to the left on the $\theta$ continuum,
and ten with a target shifted 0.6 to the right. The (integer) decision variables in these
constraints represented the frequencies needed for the cells in the full attribute tables,
$C \times Q$. The numbers of decision variables and constraints in the models for the item pool,
stimulus pool, and assignment of the items to the stimuli are given in the fourth column of 
Table 2. 

A previous pool of 5,316 items was available to define cost functions for the
models. The functions were defined as $\varphi(x_{cq}) = x_{cq}^{-1}$, with an arbitrarily large value
substituted for empty cells in the table. The items in the pool were calibrated using the
3-PL model in Equation 1. Before estimating the costs, the attribute tables were reduced in
size. Since the values of the items for the guessing parameter in the model, $c_i$, did not
vary much, this parameter was left out as a dimension of the attribute table $C \times Q$, using
its average value to calculate the information function values in the models. Also, the
values of the $a_i$ and $b_i$ parameters were grouped into 8 and 10 intervals, respectively,
using the midpoints of the intervals to calculate the information function values. Further, 
attributes that did not show any correlation with the item parameter values were identified, 
and the table was collapsed over these attributes. Finally, neighboring values of the same 
attribute with approximately the same (conditional) distribution were grouped. The purpose 
of all grouping and collapsing was to reduce the number of cells in the table and get more 
stable estimates of the frequencies in the cost function. Cells that are collapsed in the 
attribute tables received a value for the cost function based on their marginal frequencies. 
As the three sections in the LSAT have no overlap in items, three independent
models had to be solved. The numbers of constraints and decision variables in the
integer-programming models are given in Table 2. The three sets of test specifications the pool
was assumed to support involved no constraints on interdependent items between them. 
Also, the objective functions in the models are sums of costs across these sets which are 
minimal if the costs for each set are minimal. The models could therefore be solved 
independently for each set. 

The best strategy to solve models of the size in the current application is through 
the simplex algorithm for the relaxed version of the models, that is, with the integer 
variables replaced by (nonnegative) real-valued variables, rounding the solutions upward.
The simplex algorithm as implemented in the Consolve module in the test assembly
software package ConTEST was used (Timminga, van der Linden & Schweitzer, 1996).
The solution times for all models were approximately 1 second of CPU time on a PC. No
rounding appeared to be necessary; the algorithm found a direct integer solution for all
variables. This result occurred because the matrix of coefficients for the models appeared
to have a unimodular structure (for this property, see Nemhauser & Wolsey, 1988, chap.
II.3). This feature does not generalize automatically to other applications of the integer
programming models for item pool design in this paper. However, rounding the variables
upward is a simple but effective strategy that always works.
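
The rounding step can be sketched as follows. The relaxed solution shown is hypothetical, not output from the study, and the small tolerance is an assumption added to keep floating-point noise from pushing a value such as 3.0000000001 up to 4. Because the information constraints are lower bounds with nonnegative coefficients, rounding the cell frequencies upward cannot violate them.

```python
import math

def round_up(solution, tol=1e-9):
    """Round each fractional cell frequency up to the next integer."""
    return {cell: max(0, math.ceil(value - tol))
            for cell, value in solution.items()}

def information(counts, info):
    """Total information of a blueprint at one ability point."""
    return sum(info[cell] * n for cell, n in counts.items())

# Hypothetical solution of a relaxed model and per-cell information values.
relaxed = {"q1": 2.4, "q2": 3.0, "q3": 0.0}
info = {"q1": 0.25, "q2": 0.5, "q3": 1.0}
target = 2.0

rounded = round_up(relaxed)
still_feasible = information(rounded, info) >= target
```

The price of rounding is a blueprint that may be slightly more expensive than the integer optimum, which fits the earlier remark that a few spare items are prudent anyway.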

Concluding Remarks 

The previous example illustrated how the integer programming models in this paper
can be used to calculate blueprints for new item pools to support a testing program. 
Although not illustrated in the example, only a minor correction to the decision variables 
is necessary to use the models as tools for managing item pools in an ongoing program. 
The same correction can be used to cope with changes in test specifications and/or 
objective functions for the tests. 

A case not dealt with in this paper is the one in which the tests assembled from the 
pool are allowed to have item overlap. If overlap is allowed, the number of possible tests
from a given pool goes up. Determining how many different tests are possible under this 
condition involves a complicated combinatorial problem (Theunissen, 1996). If the overlap 
is small, a wise strategy might be to just ignore this possibility, the result being an item 
pool somewhat too large but allowing for all planned tests to be assembled. However, the 
problem of how to design an item pool of minimal size to support tests for which large
overlap is allowed remains to be solved.
Table 1

Distribution of items in the pool and constrained distribution 
of items in a test form (case of two categorical attributes) 
Table 2

Number of item attributes, constraints, and decision variables for the three 
sections of the LSAT 


Model              #Attributes   #Constraints   #Decision Variables

Section A
  Item Pool             5             70              1920
  Stimulus Pool         4              1                 8
  Assignment            *              4                24

Section B
  Item Pool             6             97              6144
  Stimulus Pool         4              6              4 31
  Assignment            *             10                12

Section C
  Item Pool             5             65             11520

Note. Cells with * are not applicable.